Dataset statistics
| Number of variables | 13 |
|---|---|
| Number of observations | 5570 |
| Missing cells | 0 |
| Missing cells (%) | 0.0% |
| Duplicate rows | 0 |
| Duplicate rows (%) | 0.0% |
| Total size in memory | 989.4 KiB |
| Average record size in memory | 181.9 B |
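As a quick sanity check, the average record size should be roughly the total in-memory size divided by the number of observations. The sketch below uses only the rounded figures from the table above and assumes that 1 KiB is 1024 bytes, so a small rounding difference is possible.

```python
# Figures copied from the dataset statistics table.
total_size_kib = 989.4
n_observations = 5570

# Assumption: 1 KiB = 1024 bytes.
total_size_bytes = total_size_kib * 1024
avg_record_bytes = total_size_bytes / n_observations

print(f"{avg_record_bytes:.1f} B per record")
```

The result agrees with the reported 181.9 B to one decimal place.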
Variable types
| Numeric | 12 |
|---|---|
| Categorical | 1 |
Alerts
Município has a high cardinality: 5570 distinct values | High cardinality |
CV_HEPatite_B is highly correlated with CV_HIB and 8 other fields | High correlation |
CV_HIB is highly correlated with CV_HEPatite_B and 8 other fields | High correlation |
CV_DPT is highly correlated with CV_HEPatite_B and 8 other fields | High correlation |
CV_POLIO is highly correlated with CV_HEPatite_B and 8 other fields | High correlation |
CV_Pneumo is highly correlated with CV_HEPatite_B and 8 other fields | High correlation |
CV_MncC is highly correlated with CV_HEPatite_B and 8 other fields | High correlation |
CV_SCR1 is highly correlated with CV_HEPatite_B and 8 other fields | High correlation |
CV_SCR2 is highly correlated with CV_HEPatite_B and 8 other fields | High correlation |
CV_VARICELA is highly correlated with CV_HEPatite_B and 8 other fields | High correlation |
CV_HEPatite_A is highly correlated with CV_HEPatite_B and 8 other fields | High correlation |
Município is uniformly distributed | Uniform |
COD has unique values | Unique |
Município has unique values | Unique |
CV_BCG has 326 (5.9%) zeros | Zeros |
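The Zeros and Unique alerts can be reproduced from counts stated in this report. The following is a minimal sketch; the helper names `pct` and `is_unique` are illustrative and are not part of the profiling tool.

```python
n_observations = 5570

def pct(count, total=n_observations):
    """Return count as a percentage of total, rounded to one decimal."""
    return round(100 * count / total, 1)

def is_unique(distinct_count, total=n_observations):
    """A column is unique when every observation has its own value."""
    return distinct_count == total

zeros_pct = pct(326)            # zeros reported for CV_BCG
unique_flag = is_unique(5570)   # distinct values reported for Município

print(zeros_pct, unique_flag)
```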
Reproduction
| Analysis started | 2022-11-09 00:43:30.167382 |
|---|---|
| Analysis finished | 2022-11-09 00:43:50.394813 |
| Duration | 20.23 seconds |
| Software version | pandas-profiling v3.4.0 |
| Download configuration | config.json |
COD
| Distinct | 5570 |
|---|---|
| Distinct (%) | 100.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 325358.6278 |
| Minimum | 110001 |
|---|---|
| Maximum | 530010 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 43.6 KiB |
Quantile statistics
| Minimum | 110001 |
|---|---|
| 5-th percentile | 150777.25 |
| Q1 | 251212.5 |
| median | 314627.5 |
| Q3 | 411918.75 |
| 95-th percentile | 510729.55 |
| Maximum | 530010 |
| Range | 420009 |
| Interquartile range (IQR) | 160706.25 |
Descriptive statistics
| Standard deviation | 98491.03388 |
|---|---|
| Coefficient of variation (CV) | 0.3027152977 |
| Kurtosis | -0.5258091553 |
| Mean | 325358.6278 |
| Median Absolute Deviation (MAD) | 74152.5 |
| Skewness | 0.1213411839 |
| Sum | 1812247557 |
| Variance | 9700483754 |
| Monotonicity | Strictly increasing |
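Several of the descriptive statistics above can be derived from one another. The sketch below recomputes them for the numeric variable summarized above (its values match the COD column in the sample rows), using only figures copied from the tables. Because the inputs are rounded, the last digits may differ slightly from the report.

```python
# Values copied from the statistics tables of the variable above.
n = 5570
total = 1812247557            # Sum
std = 98491.03388             # Standard deviation
minimum, maximum = 110001, 530010
q1, q3 = 251212.5, 411918.75

mean = total / n
derived = {
    "mean": mean,
    "cv": std / mean,             # coefficient of variation
    "variance": std ** 2,
    "range": maximum - minimum,
    "iqr": q3 - q1,               # interquartile range
}

for name, value in derived.items():
    print(f"{name}: {value:.10g}")
```

Each derived value matches the corresponding table entry to within the rounding of the inputs.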
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 110001 | 1 | < 0.1% |
| 353970 | 1 | < 0.1% |
| 354040 | 1 | < 0.1% |
| 354030 | 1 | < 0.1% |
| 354025 | 1 | < 0.1% |
| 354020 | 1 | < 0.1% |
| 354010 | 1 | < 0.1% |
| 354000 | 1 | < 0.1% |
| 353990 | 1 | < 0.1% |
| 353980 | 1 | < 0.1% |
| Other values (5560) | 5560 |
| Value | Count | Frequency (%) |
| 110001 | 1 | |
| 110002 | 1 | |
| 110003 | 1 | |
| 110004 | 1 | |
| 110005 | 1 | |
| 110006 | 1 | |
| 110007 | 1 | |
| 110008 | 1 | |
| 110009 | 1 | |
| 110010 | 1 |
| Value | Count | Frequency (%) |
| 530010 | 1 | |
| 522230 | 1 | |
| 522220 | 1 | |
| 522205 | 1 | |
| 522200 | 1 | |
| 522190 | 1 | |
| 522185 | 1 | |
| 522180 | 1 | |
| 522170 | 1 | |
| 522160 | 1 |
Município
| Distinct | 5570 |
|---|---|
| Distinct (%) | 100.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 467.3 KiB |
| 110001 Alta Floresta D'Oeste | 1 |
|---|---|
| 353970 Platina | 1 |
| 354040 Populina | 1 |
| 354030 Pontes Gestal | 1 |
| 354025 Pontalinda | 1 |
| Other values (5565) |
Length
| Max length | 39 |
|---|---|
| Median length | 34 |
| Mean length | 18.61059246 |
| Min length | 10 |
Characters and Unicode
| Total characters | 103661 |
|---|---|
| Distinct characters | 80 |
| Distinct categories | 6 |
| Distinct scripts | 2 |
| Distinct blocks | 2 |
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
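To illustrate how such character properties can be read, here is a small sketch that uses Python's standard `unicodedata` module on the first sample value of this variable. It is not necessarily how the profiling tool computes its tables. The two-letter codes are Unicode general categories: `Ll` is Lowercase Letter, `Lu` is Uppercase Letter, `Nd` is Decimal Number, `Zs` is Space Separator, and `Po` is Other Punctuation.

```python
import unicodedata
from collections import Counter

def category_counts(text):
    """Count Unicode general categories (e.g. 'Ll', 'Nd') in a string."""
    return Counter(unicodedata.category(ch) for ch in text)

sample = "110001 Alta Floresta D'Oeste"  # first sample row of the variable
counts = category_counts(sample)

for category, count in counts.most_common():
    print(category, count)
```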
Unique
| Unique | 5570 |
|---|---|
| Unique (%) | 100.0% |
Sample
| 1st row | 110001 Alta Floresta D'Oeste |
|---|---|
| 2nd row | 110002 Ariquemes |
| 3rd row | 110003 Cabixi |
| 4th row | 110004 Cacoal |
| 5th row | 110005 Cerejeiras |
Common Values
| Value | Count | Frequency (%) |
| 110001 Alta Floresta D'Oeste | 1 | < 0.1% |
| 353970 Platina | 1 | < 0.1% |
| 354040 Populina | 1 | < 0.1% |
| 354030 Pontes Gestal | 1 | < 0.1% |
| 354025 Pontalinda | 1 | < 0.1% |
| 354020 Pontal | 1 | < 0.1% |
| 354010 Pongaí | 1 | < 0.1% |
| 354000 Pompéia | 1 | < 0.1% |
| 353990 Poloni | 1 | < 0.1% |
| 353980 Poá | 1 | < 0.1% |
| Other values (5560) | 5560 |
Length
Histogram of lengths of the category
| Value | Count | Frequency (%) |
| do | 756 | 4.8% |
| são | 364 | 2.3% |
| de | 302 | 1.9% |
| santa | 161 | 1.0% |
| da | 143 | 0.9% |
| nova | 135 | 0.9% |
| sul | 115 | 0.7% |
| rio | 94 | 0.6% |
| dos | 73 | 0.5% |
| josé | 70 | 0.4% |
| Other values (9533) | 13640 |
Most occurring characters
| Value | Count | Frequency (%) |
| (space) | 10283 | 9.9% |
| a | 8791 | 8.5% |
| 0 | 8160 | 7.9% |
| o | 5961 | 5.8% |
| 1 | 4774 | 4.6% |
| 2 | 4591 | 4.4% |
| r | 4532 | 4.4% |
| i | 4388 | 4.2% |
| 3 | 4106 | 4.0% |
| e | 3764 | 3.6% |
| Other values (70) | 44311 |
Most occurring categories
| Value | Count | Frequency (%) |
| Lowercase Letter | 50872 | |
| Decimal Number | 33420 | |
| Space Separator | 10283 | 9.9% |
| Uppercase Letter | 9010 | 8.7% |
| Other Punctuation | 47 | < 0.1% |
| Dash Punctuation | 29 | < 0.1% |
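Some rows in the table above have no percentage. The sketch below derives the share of each category from the counts, assuming the percentage is the count divided by the total number of characters. That assumption reproduces the 9.9% and 8.7% shown for the space and uppercase rows.

```python
# Counts copied from the "Most occurring categories" table.
categories = {
    "Lowercase Letter": 50872,
    "Decimal Number": 33420,
    "Space Separator": 10283,
    "Uppercase Letter": 9010,
    "Other Punctuation": 47,
    "Dash Punctuation": 29,
}
total_characters = sum(categories.values())

# Share of each category, in percent, rounded to one decimal.
shares = {name: round(100 * count / total_characters, 1)
          for name, count in categories.items()}

print(total_characters)
for name, share in shares.items():
    print(f"{name}: {share}%")
```

The total equals the 103661 characters reported under "Characters and Unicode".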
Most frequent character per category
Lowercase Letter
| Value | Count | Frequency (%) |
| a | 8791 | |
| o | 5961 | |
| r | 4532 | |
| i | 4388 | |
| e | 3764 | 7.4% |
| n | 3196 | 6.3% |
| d | 2553 | 5.0% |
| s | 2423 | 4.8% |
| t | 2293 | 4.5% |
| u | 2155 | 4.2% |
| Other values (27) | 10816 |
Uppercase Letter
| Value | Count | Frequency (%) |
| S | 1137 | |
| C | 970 | |
| P | 911 | 10.1% |
| M | 721 | 8.0% |
| A | 698 | 7.7% |
| B | 602 | 6.7% |
| I | 475 | 5.3% |
| J | 405 | 4.5% |
| G | 391 | 4.3% |
| R | 367 | 4.1% |
| Other values (20) | 2333 |
Decimal Number
| Value | Count | Frequency (%) |
| 0 | 8160 | |
| 1 | 4774 | |
| 2 | 4591 | |
| 3 | 4106 | |
| 5 | 3654 | |
| 4 | 2781 | 8.3% |
| 7 | 1470 | 4.4% |
| 6 | 1422 | 4.3% |
| 9 | 1382 | 4.1% |
| 8 | 1080 | 3.2% |
Space Separator
| Value | Count | Frequency (%) |
| (space) | 10283 |
Other Punctuation
| Value | Count | Frequency (%) |
| ' | 47 |
Dash Punctuation
| Value | Count | Frequency (%) |
| - | 29 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Latin | 59882 | |
| Common | 43779 |
Most frequent character per script
Latin
| Value | Count | Frequency (%) |
| a | 8791 | |
| o | 5961 | 10.0% |
| r | 4532 | 7.6% |
| i | 4388 | 7.3% |
| e | 3764 | 6.3% |
| n | 3196 | 5.3% |
| d | 2553 | 4.3% |
| s | 2423 | 4.0% |
| t | 2293 | 3.8% |
| u | 2155 | 3.6% |
| Other values (57) | 19826 |
Common
| Value | Count | Frequency (%) |
| (space) | 10283 | |
| 0 | 8160 | |
| 1 | 4774 | |
| 2 | 4591 | |
| 3 | 4106 | 9.4% |
| 5 | 3654 | 8.3% |
| 4 | 2781 | 6.4% |
| 7 | 1470 | 3.4% |
| 6 | 1422 | 3.2% |
| 9 | 1382 | 3.2% |
| Other values (3) | 1156 | 2.6% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 100822 | |
| None | 2839 | 2.7% |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| (space) | 10283 | 10.2% |
| a | 8791 | 8.7% |
| 0 | 8160 | 8.1% |
| o | 5961 | 5.9% |
| 1 | 4774 | 4.7% |
| 2 | 4591 | 4.6% |
| r | 4532 | 4.5% |
| i | 4388 | 4.4% |
| 3 | 4106 | 4.1% |
| e | 3764 | 3.7% |
| Other values (54) | 41472 |
None
| Value | Count | Frequency (%) |
| ã | 794 | |
| á | 393 | |
| í | 336 | |
| é | 317 | 11.2% |
| ç | 268 | 9.4% |
| ó | 243 | 8.6% |
| â | 161 | 5.7% |
| ú | 101 | 3.6% |
| ô | 71 | 2.5% |
| ê | 70 | 2.5% |
| Other values (6) | 85 | 3.0% |
CV_BCG
| Distinct | 3664 |
|---|---|
| Distinct (%) | 65.8% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 59.63224417 |
| Minimum | 0 |
|---|---|
| Maximum | 701.31 |
| Zeros | 326 |
| Zeros (%) | 5.9% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 43.6 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 22.04 |
| median | 59.215 |
| Q3 | 90.91 |
| 95-th percentile | 122.0005 |
| Maximum | 701.31 |
| Range | 701.31 |
| Interquartile range (IQR) | 68.87 |
Descriptive statistics
| Standard deviation | 44.36443295 |
|---|---|
| Coefficient of variation (CV) | 0.7439671871 |
| Kurtosis | 16.05381678 |
| Mean | 59.63224417 |
| Median Absolute Deviation (MAD) | 34.215 |
| Skewness | 1.753687951 |
| Sum | 332151.6 |
| Variance | 1968.202911 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 0 | 326 | 5.9% |
| 100 | 47 | 0.8% |
| 50 | 20 | 0.4% |
| 33.33 | 16 | 0.3% |
| 25 | 14 | 0.3% |
| 66.67 | 12 | 0.2% |
| 14.29 | 11 | 0.2% |
| 87.5 | 11 | 0.2% |
| 12.5 | 11 | 0.2% |
| 60 | 10 | 0.2% |
| Other values (3654) | 5092 |
| Value | Count | Frequency (%) |
| 0 | 326 | |
| 0.2 | 1 | < 0.1% |
| 0.21 | 1 | < 0.1% |
| 0.24 | 1 | < 0.1% |
| 0.28 | 1 | < 0.1% |
| 0.29 | 1 | < 0.1% |
| 0.34 | 1 | < 0.1% |
| 0.4 | 1 | < 0.1% |
| 0.42 | 1 | < 0.1% |
| 0.43 | 1 | < 0.1% |
| Value | Count | Frequency (%) |
| 701.31 | 1 | |
| 630.22 | 1 | |
| 469.31 | 1 | |
| 437.83 | 1 | |
| 409.2 | 1 | |
| 334.31 | 1 | |
| 326.92 | 1 | |
| 316.46 | 1 | |
| 309.76 | 1 | |
| 284.23 | 1 |
CV_HEPatite_B
| Distinct | 3570 |
|---|---|
| Distinct (%) | 64.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 93.80218851 |
| Minimum | 0 |
|---|---|
| Maximum | 510 |
| Zeros | 5 |
| Zeros (%) | 0.1% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 43.6 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 40.3065 |
| Q1 | 74.58 |
| median | 94.58 |
| Q3 | 112 |
| 95-th percentile | 143.5405 |
| Maximum | 510 |
| Range | 510 |
| Interquartile range (IQR) | 37.42 |
Descriptive statistics
| Standard deviation | 32.06833416 |
|---|---|
| Coefficient of variation (CV) | 0.3418719186 |
| Kurtosis | 6.924718488 |
| Mean | 93.80218851 |
| Median Absolute Deviation (MAD) | 18.46 |
| Skewness | 0.7275547686 |
| Sum | 522478.19 |
| Variance | 1028.378056 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 100 | 85 | 1.5% |
| 150 | 18 | 0.3% |
| 125 | 15 | 0.3% |
| 116.67 | 15 | 0.3% |
| 120 | 14 | 0.3% |
| 112.5 | 14 | 0.3% |
| 66.67 | 13 | 0.2% |
| 109.09 | 13 | 0.2% |
| 133.33 | 13 | 0.2% |
| 128.57 | 12 | 0.2% |
| Other values (3560) | 5358 |
| Value | Count | Frequency (%) |
| 0 | 5 | |
| 0.3 | 1 | < 0.1% |
| 0.75 | 1 | < 0.1% |
| 3.32 | 1 | < 0.1% |
| 4.76 | 1 | < 0.1% |
| 4.78 | 1 | < 0.1% |
| 4.84 | 1 | < 0.1% |
| 5.42 | 1 | < 0.1% |
| 6.13 | 1 | < 0.1% |
| 7.09 | 1 | < 0.1% |
| Value | Count | Frequency (%) |
| 510 | 1 | < 0.1% |
| 338.46 | 1 | < 0.1% |
| 300 | 1 | < 0.1% |
| 266.4 | 1 | < 0.1% |
| 257.14 | 2 | |
| 253.85 | 1 | < 0.1% |
| 250 | 1 | < 0.1% |
| 244.83 | 1 | < 0.1% |
| 225 | 3 | |
| 221.74 | 1 | < 0.1% |
CV_HIB
| Distinct | 3592 |
|---|---|
| Distinct (%) | 64.5% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 93.78474686 |
| Minimum | 0 |
|---|---|
| Maximum | 510 |
| Zeros | 5 |
| Zeros (%) | 0.1% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 43.6 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 40.239 |
| Q1 | 74.3875 |
| median | 94.535 |
| Q3 | 112.025 |
| 95-th percentile | 143.6175 |
| Maximum | 510 |
| Range | 510 |
| Interquartile range (IQR) | 37.6375 |
Descriptive statistics
| Standard deviation | 32.12515672 |
|---|---|
| Coefficient of variation (CV) | 0.3425413812 |
| Kurtosis | 6.830354754 |
| Mean | 93.78474686 |
| Median Absolute Deviation (MAD) | 18.545 |
| Skewness | 0.7225574621 |
| Sum | 522381.04 |
| Variance | 1032.025694 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 100 | 84 | 1.5% |
| 150 | 18 | 0.3% |
| 120 | 17 | 0.3% |
| 66.67 | 15 | 0.3% |
| 125 | 14 | 0.3% |
| 112.5 | 14 | 0.3% |
| 116.67 | 13 | 0.2% |
| 133.33 | 12 | 0.2% |
| 83.33 | 12 | 0.2% |
| 88.89 | 12 | 0.2% |
| Other values (3582) | 5359 |
| Value | Count | Frequency (%) |
| 0 | 5 | |
| 0.3 | 1 | < 0.1% |
| 0.75 | 1 | < 0.1% |
| 2.97 | 1 | < 0.1% |
| 4.76 | 1 | < 0.1% |
| 4.78 | 1 | < 0.1% |
| 4.84 | 1 | < 0.1% |
| 5.42 | 1 | < 0.1% |
| 6.13 | 1 | < 0.1% |
| 7.09 | 1 | < 0.1% |
| Value | Count | Frequency (%) |
| 510 | 1 | |
| 330.77 | 1 | |
| 300 | 1 | |
| 266.73 | 1 | |
| 257.14 | 2 | |
| 253.85 | 1 | |
| 250 | 1 | |
| 244.83 | 1 | |
| 241.67 | 1 | |
| 225 | 2 |
CV_DPT
| Distinct | 3639 |
|---|---|
| Distinct (%) | 65.3% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 93.9977684 |
| Minimum | 0 |
|---|---|
| Maximum | 510 |
| Zeros | 5 |
| Zeros (%) | 0.1% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 43.6 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 40.323 |
| Q1 | 74.8425 |
| median | 94.655 |
| Q3 | 112.155 |
| 95-th percentile | 143.6455 |
| Maximum | 510 |
| Range | 510 |
| Interquartile range (IQR) | 37.3125 |
Descriptive statistics
| Standard deviation | 32.10889569 |
|---|---|
| Coefficient of variation (CV) | 0.3415921063 |
| Kurtosis | 6.834642154 |
| Mean | 93.9977684 |
| Median Absolute Deviation (MAD) | 18.52 |
| Skewness | 0.7223296629 |
| Sum | 523567.57 |
| Variance | 1030.981183 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 100 | 88 | 1.6% |
| 120 | 18 | 0.3% |
| 150 | 17 | 0.3% |
| 128.57 | 15 | 0.3% |
| 125 | 15 | 0.3% |
| 66.67 | 15 | 0.3% |
| 116.67 | 15 | 0.3% |
| 133.33 | 14 | 0.3% |
| 112.5 | 14 | 0.3% |
| 114.29 | 12 | 0.2% |
| Other values (3629) | 5347 |
| Value | Count | Frequency (%) |
| 0 | 5 | |
| 0.3 | 1 | < 0.1% |
| 0.75 | 1 | < 0.1% |
| 3.32 | 1 | < 0.1% |
| 4.76 | 1 | < 0.1% |
| 4.78 | 1 | < 0.1% |
| 4.84 | 1 | < 0.1% |
| 5.42 | 1 | < 0.1% |
| 6.13 | 1 | < 0.1% |
| 7.09 | 1 | < 0.1% |
| Value | Count | Frequency (%) |
| 510 | 1 | |
| 330.77 | 1 | |
| 300 | 1 | |
| 267.73 | 1 | |
| 257.14 | 2 | |
| 253.85 | 1 | |
| 250 | 1 | |
| 244.83 | 1 | |
| 241.67 | 1 | |
| 225 | 2 |
CV_POLIO
| Distinct | 3473 |
|---|---|
| Distinct (%) | 62.4% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 91.73294434 |
| Minimum | 0 |
|---|---|
| Maximum | 500 |
| Zeros | 4 |
| Zeros (%) | 0.1% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 43.6 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 44.4235 |
| Q1 | 75.8425 |
| median | 92.34 |
| Q3 | 106.935 |
| 95-th percentile | 134.5345 |
| Maximum | 500 |
| Range | 500 |
| Interquartile range (IQR) | 31.0925 |
Descriptive statistics
| Standard deviation | 28.39977273 |
|---|---|
| Coefficient of variation (CV) | 0.3095918586 |
| Kurtosis | 12.71882021 |
| Mean | 91.73294434 |
| Median Absolute Deviation (MAD) | 15.515 |
| Skewness | 1.084955645 |
| Sum | 510952.5 |
| Variance | 806.5470913 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 100 | 113 | 2.0% |
| 88.89 | 17 | 0.3% |
| 66.67 | 16 | 0.3% |
| 116.67 | 15 | 0.3% |
| 90 | 15 | 0.3% |
| 133.33 | 14 | 0.3% |
| 120 | 14 | 0.3% |
| 125 | 13 | 0.2% |
| 128.57 | 13 | 0.2% |
| 111.11 | 12 | 0.2% |
| Other values (3463) | 5328 |
| Value | Count | Frequency (%) |
| 0 | 4 | |
| 0.59 | 1 | < 0.1% |
| 1.13 | 1 | < 0.1% |
| 4.76 | 1 | < 0.1% |
| 5.08 | 1 | < 0.1% |
| 5.22 | 1 | < 0.1% |
| 5.36 | 1 | < 0.1% |
| 5.93 | 1 | < 0.1% |
| 7.14 | 1 | < 0.1% |
| 7.19 | 1 | < 0.1% |
| Value | Count | Frequency (%) |
| 500 | 1 | |
| 423.08 | 1 | |
| 291.43 | 1 | |
| 262.16 | 1 | |
| 257.14 | 1 | |
| 250 | 1 | |
| 246.43 | 1 | |
| 243.75 | 1 | |
| 232 | 1 | |
| 231.03 | 1 |
CV_Pneumo
| Distinct | 3400 |
|---|---|
| Distinct (%) | 61.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 95.53718851 |
| Minimum | 0 |
|---|---|
| Maximum | 440 |
| Zeros | 5 |
| Zeros (%) | 0.1% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 43.6 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 50.197 |
| Q1 | 81.25 |
| median | 96.295 |
| Q3 | 109.91 |
| 95-th percentile | 136.2835 |
| Maximum | 440 |
| Range | 440 |
| Interquartile range (IQR) | 28.66 |
Descriptive statistics
| Standard deviation | 27.26761444 |
|---|---|
| Coefficient of variation (CV) | 0.2854136161 |
| Kurtosis | 10.67271196 |
| Mean | 95.53718851 |
| Median Absolute Deviation (MAD) | 14.3 |
| Skewness | 0.8332290648 |
| Sum | 532142.14 |
| Variance | 743.5227973 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 100 | 112 | 2.0% |
| 111.11 | 22 | 0.4% |
| 125 | 20 | 0.4% |
| 120 | 18 | 0.3% |
| 133.33 | 16 | 0.3% |
| 112.5 | 14 | 0.3% |
| 150 | 14 | 0.3% |
| 80 | 12 | 0.2% |
| 75 | 12 | 0.2% |
| 87.5 | 12 | 0.2% |
| Other values (3390) | 5318 |
| Value | Count | Frequency (%) |
| 0 | 5 | |
| 0.75 | 1 | < 0.1% |
| 4.75 | 1 | < 0.1% |
| 5.54 | 1 | < 0.1% |
| 5.65 | 1 | < 0.1% |
| 6.21 | 1 | < 0.1% |
| 6.44 | 1 | < 0.1% |
| 6.96 | 1 | < 0.1% |
| 7.06 | 1 | < 0.1% |
| 8.48 | 1 | < 0.1% |
| Value | Count | Frequency (%) |
| 440 | 1 | |
| 430.77 | 1 | |
| 300 | 1 | |
| 280.63 | 1 | |
| 237.5 | 1 | |
| 235 | 1 | |
| 228.57 | 2 | |
| 212.82 | 1 | |
| 212 | 1 | |
| 206.25 | 1 |
CV_MncC
| Distinct | 3417 |
|---|---|
| Distinct (%) | 61.3% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 93.35078456 |
| Minimum | 0 |
|---|---|
| Maximum | 430 |
| Zeros | 4 |
| Zeros (%) | 0.1% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 43.6 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 47.8775 |
| Q1 | 78.59 |
| median | 94.12 |
| Q3 | 107.55 |
| 95-th percentile | 134.7305 |
| Maximum | 430 |
| Range | 430 |
| Interquartile range (IQR) | 28.96 |
Descriptive statistics
| Standard deviation | 27.499909 |
|---|---|
| Coefficient of variation (CV) | 0.294586801 |
| Kurtosis | 10.99428273 |
| Mean | 93.35078456 |
| Median Absolute Deviation (MAD) | 14.495 |
| Skewness | 0.9682587377 |
| Sum | 519963.87 |
| Variance | 756.2449949 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 100 | 111 | 2.0% |
| 88.89 | 18 | 0.3% |
| 133.33 | 18 | 0.3% |
| 108.33 | 16 | 0.3% |
| 83.33 | 16 | 0.3% |
| 112.5 | 14 | 0.3% |
| 66.67 | 14 | 0.3% |
| 120 | 14 | 0.3% |
| 110 | 13 | 0.2% |
| 116.67 | 12 | 0.2% |
| Other values (3407) | 5324 |
| Value | Count | Frequency (%) |
| 0 | 4 | |
| 0.3 | 1 | < 0.1% |
| 0.75 | 1 | < 0.1% |
| 3.26 | 1 | < 0.1% |
| 3.57 | 1 | < 0.1% |
| 5.29 | 1 | < 0.1% |
| 5.76 | 1 | < 0.1% |
| 6.06 | 1 | < 0.1% |
| 6.63 | 1 | < 0.1% |
| 7.64 | 1 | < 0.1% |
| Value | Count | Frequency (%) |
| 430 | 1 | |
| 423.08 | 1 | |
| 350 | 1 | |
| 320 | 1 | |
| 263.59 | 1 | |
| 228.57 | 1 | |
| 225 | 1 | |
| 215.38 | 1 | |
| 212 | 1 | |
| 210.71 | 1 |
CV_SCR1
| Distinct | 3538 |
|---|---|
| Distinct (%) | 63.5% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 93.74959425 |
| Minimum | 0 |
|---|---|
| Maximum | 341.67 |
| Zeros | 4 |
| Zeros (%) | 0.1% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 43.6 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 45.2455 |
| Q1 | 77.5775 |
| median | 95.035 |
| Q3 | 109.52 |
| 95-th percentile | 137.5 |
| Maximum | 341.67 |
| Range | 341.67 |
| Interquartile range (IQR) | 31.9425 |
Descriptive statistics
| Standard deviation | 29.0318525 |
|---|---|
| Coefficient of variation (CV) | 0.3096744336 |
| Kurtosis | 4.270358906 |
| Mean | 93.74959425 |
| Median Absolute Deviation (MAD) | 15.885 |
| Skewness | 0.546428078 |
| Sum | 522185.24 |
| Variance | 842.8484596 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 100 | 106 | 1.9% |
| 114.29 | 19 | 0.3% |
| 150 | 19 | 0.3% |
| 125 | 17 | 0.3% |
| 116.67 | 17 | 0.3% |
| 91.67 | 16 | 0.3% |
| 66.67 | 16 | 0.3% |
| 120 | 15 | 0.3% |
| 133.33 | 14 | 0.3% |
| 75 | 13 | 0.2% |
| Other values (3528) | 5318 |
| Value | Count | Frequency (%) |
| 0 | 4 | |
| 1.37 | 1 | < 0.1% |
| 2.67 | 1 | < 0.1% |
| 3.39 | 1 | < 0.1% |
| 4.27 | 1 | < 0.1% |
| 4.39 | 1 | < 0.1% |
| 4.56 | 1 | < 0.1% |
| 6.89 | 1 | < 0.1% |
| 7.63 | 1 | < 0.1% |
| 8.25 | 1 | < 0.1% |
| Value | Count | Frequency (%) |
| 341.67 | 1 | |
| 316.67 | 1 | |
| 284.21 | 1 | |
| 277.27 | 1 | |
| 275 | 1 | |
| 273.08 | 1 | |
| 269.23 | 1 | |
| 251.98 | 1 | |
| 250 | 2 | |
| 242.42 | 1 |
CV_SCR2
| Distinct | 3732 |
|---|---|
| Distinct (%) | 67.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 75.77722442 |
| Minimum | 0 |
|---|---|
| Maximum | 525 |
| Zeros | 38 |
| Zeros (%) | 0.7% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 43.6 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 20.51 |
| Q1 | 53.5 |
| median | 76.13 |
| Q3 | 96.245 |
| 95-th percentile | 128.4355 |
| Maximum | 525 |
| Range | 525 |
| Interquartile range (IQR) | 42.745 |
Descriptive statistics
| Standard deviation | 33.79979589 |
|---|---|
| Coefficient of variation (CV) | 0.4460416194 |
| Kurtosis | 6.924374083 |
| Mean | 75.77722442 |
| Median Absolute Deviation (MAD) | 21.26 |
| Skewness | 0.792160371 |
| Sum | 422079.14 |
| Variance | 1142.426202 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 100 | 73 | 1.3% |
| 0 | 38 | 0.7% |
| 50 | 34 | 0.6% |
| 75 | 16 | 0.3% |
| 114.29 | 14 | 0.3% |
| 80 | 14 | 0.3% |
| 120 | 14 | 0.3% |
| 66.67 | 14 | 0.3% |
| 85.71 | 13 | 0.2% |
| 60 | 13 | 0.2% |
| Other values (3722) | 5327 |
| Value | Count | Frequency (%) |
| 0 | 38 | |
| 0.44 | 1 | < 0.1% |
| 0.68 | 1 | < 0.1% |
| 1.39 | 1 | < 0.1% |
| 1.57 | 1 | < 0.1% |
| 1.69 | 1 | < 0.1% |
| 1.96 | 1 | < 0.1% |
| 2.13 | 1 | < 0.1% |
| 2.14 | 1 | < 0.1% |
| 2.15 | 1 | < 0.1% |
| Value | Count | Frequency (%) |
| 525 | 1 | |
| 346.15 | 1 | |
| 272.73 | 1 | |
| 250 | 1 | |
| 240 | 1 | |
| 229.41 | 1 | |
| 225 | 1 | |
| 223.33 | 1 | |
| 216.67 | 1 | |
| 214.29 | 1 |
CV_VARICELA
| Distinct | 3629 |
|---|---|
| Distinct (%) | 65.2% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 88.3534219 |
| Minimum | 0 |
|---|---|
| Maximum | 325 |
| Zeros | 9 |
| Zeros (%) | 0.2% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 43.6 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 36.8845 |
| Q1 | 69.175 |
| median | 88.45 |
| Q3 | 105.88 |
| 95-th percentile | 138.89 |
| Maximum | 325 |
| Range | 325 |
| Interquartile range (IQR) | 36.705 |
Descriptive statistics
| Standard deviation | 31.5377524 |
|---|---|
| Coefficient of variation (CV) | 0.3569499825 |
| Kurtosis | 2.784775776 |
| Mean | 88.3534219 |
| Median Absolute Deviation (MAD) | 18.305 |
| Skewness | 0.5944701833 |
| Sum | 492128.56 |
| Variance | 994.6298266 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 100 | 74 | 1.3% |
| 80 | 20 | 0.4% |
| 85.71 | 19 | 0.3% |
| 110 | 15 | 0.3% |
| 83.33 | 15 | 0.3% |
| 120 | 14 | 0.3% |
| 66.67 | 13 | 0.2% |
| 116.67 | 13 | 0.2% |
| 125 | 12 | 0.2% |
| 87.5 | 12 | 0.2% |
| Other values (3619) | 5363 |
| Value | Count | Frequency (%) |
| 0 | 9 | |
| 0.33 | 1 | < 0.1% |
| 2.83 | 1 | < 0.1% |
| 4.67 | 1 | < 0.1% |
| 4.76 | 1 | < 0.1% |
| 4.88 | 1 | < 0.1% |
| 5.53 | 1 | < 0.1% |
| 5.56 | 1 | < 0.1% |
| 5.95 | 1 | < 0.1% |
| 6.06 | 1 | < 0.1% |
| Value | Count | Frequency (%) |
| 325 | 1 | |
| 296.15 | 1 | |
| 282.5 | 1 | |
| 270.59 | 1 | |
| 266.67 | 1 | |
| 263.64 | 1 | |
| 254.55 | 1 | |
| 250 | 1 | |
| 241.18 | 1 | |
| 234.16 | 1 |
CV_HEPatite_A
| Distinct | 3582 |
|---|---|
| Distinct (%) | 64.3% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 89.94414004 |
| Minimum | 0 |
|---|---|
| Maximum | 375 |
| Zeros | 8 |
| Zeros (%) | 0.1% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 43.6 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 39.108 |
| Q1 | 72.625 |
| median | 91.215 |
| Q3 | 106.85 |
| 95-th percentile | 136.36 |
| Maximum | 375 |
| Range | 375 |
| Interquartile range (IQR) | 34.225 |
Descriptive statistics
| Standard deviation | 30.47236068 |
|---|---|
| Coefficient of variation (CV) | 0.3387920621 |
| Kurtosis | 5.20248621 |
| Mean | 89.94414004 |
| Median Absolute Deviation (MAD) | 17.105 |
| Skewness | 0.6250970379 |
| Sum | 500988.86 |
| Variance | 928.5647653 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 100 | 104 | 1.9% |
| 125 | 17 | 0.3% |
| 120 | 17 | 0.3% |
| 80 | 17 | 0.3% |
| 114.29 | 15 | 0.3% |
| 118.18 | 14 | 0.3% |
| 107.69 | 13 | 0.2% |
| 87.5 | 13 | 0.2% |
| 50 | 12 | 0.2% |
| 150 | 12 | 0.2% |
| Other values (3572) | 5336 |
| Value | Count | Frequency (%) |
| 0 | 8 | |
| 0.33 | 1 | < 0.1% |
| 0.67 | 1 | < 0.1% |
| 0.68 | 1 | < 0.1% |
| 1.22 | 1 | < 0.1% |
| 2.67 | 1 | < 0.1% |
| 3.39 | 1 | < 0.1% |
| 4.56 | 1 | < 0.1% |
| 4.76 | 1 | < 0.1% |
| 5.38 | 1 | < 0.1% |
| Value | Count | Frequency (%) |
| 375 | 1 | |
| 338.46 | 1 | |
| 336.36 | 1 | |
| 325 | 1 | |
| 308.33 | 1 | |
| 286.67 | 1 | |
| 254.17 | 1 | |
| 241.67 | 1 | |
| 241.18 | 1 | |
| 226.67 | 1 |
Auto
The auto setting is an easily interpretable pairwise column metric that uses the following mapping (vartype-vartype : method): categorical-categorical : Cramer's V; numerical-categorical : Cramer's V (using a discretized numerical column); numerical-numerical : Spearman's ρ. This configuration uses the most suitable method for each pair of columns.
Spearman's ρ
Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation and +1 indicating total positive monotonic correlation. To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
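The calculation described above can be sketched in plain Python. This is an illustrative implementation, not the profiling tool's code. The helper names are hypothetical, and tied values are given the average of their ranks, which is a common convention assumed here.

```python
from statistics import mean

def average_ranks(values):
    """Rank values from 1 to n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson(x, y):
    """Covariance divided by the product of standard deviations."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)  # the 1/n factors cancel

def spearman(x, y):
    """Spearman's rho: Pearson's r applied to the rank variables."""
    return pearson(average_ranks(x), average_ranks(y))

x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]  # nonlinear but strictly increasing

rho = spearman(x, y)
r = pearson(x, y)
print(rho, r)
```

For this strictly increasing but nonlinear relationship, ρ is 1 up to floating-point error, while Pearson's r stays below 1.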
Pearson's r
Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear function the angle to the x-axis does not affect r. To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
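A minimal sketch of this calculation follows, together with a check of the invariance under changes in location and scale. The input values are the CV_HEPatite_B and CV_POLIO entries of the first five sample rows of this report, used purely for illustration. The function name `pearson` is hypothetical.

```python
from statistics import mean

def pearson(x, y):
    """Pearson's r: covariance divided by the product of standard deviations."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    # The 1/n factors of the covariance and standard deviations cancel.
    return cov / (sx * sy)

# First five sample rows: CV_HEPatite_B and CV_POLIO.
hep_b = [112.01, 104.45, 126.09, 103.40, 109.29]
polio = [104.80, 97.28, 120.29, 91.85, 108.55]

r = pearson(hep_b, polio)

# Shift and rescale one variable: r should not change.
shifted = [2 * v + 10 for v in hep_b]
r_shifted = pearson(shifted, polio)

print(r, r_shifted)
```

Five rows are far too few to say anything about the full dataset; the point is only that `r` and `r_shifted` agree up to floating-point error.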
Kendall's τ
Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation and +1 indicating total positive correlation. To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
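The pair-counting description above corresponds to the variant often called tau-a. The sketch below implements exactly that description. Tie-corrected variants such as tau-b give different values when ties are present, and which variant the profiling tool uses is not stated here. The function name is hypothetical.

```python
from itertools import combinations

def kendall_tau(x, y):
    """Kendall's tau-a: (concordant - discordant) / total number of pairs."""
    concordant = discordant = 0
    pairs = list(combinations(range(len(x)), 2))
    for i, j in pairs:
        direction = (x[i] - x[j]) * (y[i] - y[j])
        if direction > 0:
            concordant += 1      # both variables move the same way
        elif direction < 0:
            discordant += 1      # the variables move in opposite ways
        # Pairs tied in x or y (direction == 0) count as neither.
    return (concordant - discordant) / len(pairs)

x = [1, 2, 3, 4]
y = [1, 3, 2, 4]

tau = kendall_tau(x, y)
print(tau)
```

In this toy example, five of the six pairs are concordant and one is discordant, so τ = (5 - 1) / 6.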
Phik (φk)
Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in case of a bivariate normal input distribution. Extensive documentation is available for Phik.
A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.
First rows
| COD | Município | CV_BCG | CV_HEPatite_B | CV_HIB | CV_DPT | CV_POLIO | CV_Pneumo | CV_MncC | CV_SCR1 | CV_SCR2 | CV_VARICELA | CV_HEPatite_A | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 110001 | 110001 Alta Floresta D'Oeste | 86.49 | 112.01 | 112.01 | 112.31 | 104.80 | 108.71 | 105.71 | 100.82 | 79.40 | 91.21 | 95.60 |
| 1 | 110002 | 110002 Ariquemes | 102.06 | 104.45 | 104.58 | 104.58 | 97.28 | 100.20 | 95.36 | 92.11 | 73.00 | 82.88 | 86.09 |
| 2 | 110003 | 110003 Cabixi | 0.00 | 126.09 | 126.09 | 127.54 | 120.29 | 117.39 | 107.25 | 105.13 | 87.18 | 92.31 | 87.18 |
| 3 | 110004 | 110004 Cacoal | 104.83 | 103.40 | 103.47 | 103.47 | 91.85 | 96.45 | 93.21 | 105.41 | 60.06 | 87.56 | 91.37 |
| 4 | 110005 | 110005 Cerejeiras | 69.52 | 109.29 | 109.29 | 109.67 | 108.55 | 111.90 | 106.69 | 112.88 | 100.43 | 106.01 | 109.01 |
| 5 | 110006 | 110006 Colorado do Oeste | 16.75 | 124.63 | 125.62 | 125.12 | 112.81 | 118.72 | 115.27 | 110.33 | 98.59 | 94.84 | 104.69 |
| 6 | 110007 | 110007 Corumbiara | 21.50 | 132.71 | 132.71 | 132.71 | 135.51 | 114.95 | 119.63 | 114.17 | 77.95 | 97.64 | 101.57 |
| 7 | 110008 | 110008 Costa Marques | 83.68 | 115.26 | 115.26 | 115.26 | 107.37 | 114.21 | 107.89 | 90.75 | 96.04 | 94.27 | 99.56 |
| 8 | 110009 | 110009 Espigão D'Oeste | 88.96 | 96.18 | 96.18 | 96.18 | 84.50 | 99.58 | 94.48 | 84.16 | 58.37 | 89.59 | 89.37 |
| 9 | 110010 | 110010 Guajará-Mirim | 50.58 | 53.50 | 53.50 | 53.61 | 50.93 | 57.11 | 56.18 | 54.59 | 39.20 | 49.29 | 49.42 |
Last rows
| COD | Município | CV_BCG | CV_HEPatite_B | CV_HIB | CV_DPT | CV_POLIO | CV_Pneumo | CV_MncC | CV_SCR1 | CV_SCR2 | CV_VARICELA | CV_HEPatite_A | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 5560 | 522160 | 522160 Uruaçu | 96.23 | 67.74 | 67.74 | 67.74 | 80.19 | 85.66 | 78.30 | 72.59 | 56.72 | 66.90 | 74.83 |
| 5561 | 522170 | 522170 Uruana | 32.59 | 42.96 | 42.96 | 42.96 | 51.11 | 65.19 | 59.26 | 60.93 | 61.59 | 60.93 | 62.91 |
| 5562 | 522180 | 522180 Urutaí | 70.83 | 129.17 | 129.17 | 129.17 | 125.00 | 133.33 | 91.67 | 275.00 | 216.67 | 216.67 | 308.33 |
| 5563 | 522185 | 522185 Valparaíso de Goiás | 44.15 | 68.44 | 68.44 | 68.72 | 68.81 | 76.31 | 73.12 | 62.48 | 60.02 | 58.61 | 67.85 |
| 5564 | 522190 | 522190 Varjão | 113.79 | 244.83 | 244.83 | 244.83 | 231.03 | 155.17 | 186.21 | 223.33 | 196.67 | 220.00 | 203.33 |
| 5565 | 522200 | 522200 Vianópolis | 105.81 | 107.56 | 106.98 | 106.98 | 122.67 | 129.07 | 124.42 | 117.95 | 85.13 | 85.64 | 113.33 |
| 5566 | 522205 | 522205 Vicentinópolis | 51.08 | 91.37 | 91.37 | 91.37 | 77.70 | 74.10 | 74.10 | 66.97 | 69.72 | 69.72 | 81.65 |
| 5567 | 522220 | 522220 Vila Boa | 45.10 | 131.37 | 131.37 | 131.37 | 123.53 | 101.96 | 115.69 | 142.86 | 89.80 | 112.24 | 124.49 |
| 5568 | 522230 | 522230 Vila Propício | 41.51 | 90.57 | 90.57 | 90.57 | 88.68 | 122.64 | 118.87 | 80.60 | 73.13 | 73.13 | 83.58 |
| 5569 | 530010 | 530010 Brasília | 105.15 | 100.80 | 101.27 | 101.36 | 92.30 | 96.98 | 94.00 | 84.40 | 72.56 | 82.04 | 85.27 |